Forecast collapse of transformer-based models under squared loss in financial time series

Andreoletti, Pierre

arXiv.org Machine Learning

We study trajectory forecasting under squared loss for time series with weak conditional structure, using highly expressive prediction models. Building on the classical characterization of squared-loss risk minimization, we emphasize regimes in which the conditional expectation of future trajectories is effectively degenerate, leading to trivial Bayes-optimal predictors (flat for prices and zero for returns in standard financial settings). In this regime, increased model expressivity does not improve predictive accuracy but instead introduces spurious trajectory fluctuations around the optimal predictor. These fluctuations arise from the reuse of noise and result in increased prediction variance without any reduction in bias. This provides a process-level explanation for the degradation of Transformer-based forecasts on financial time series. We complement these theoretical results with numerical experiments on high-frequency EUR/USD exchange rate data, analyzing the distribution of trajectory-level forecasting errors. The results show that Transformer-based models yield larger errors than a simple linear benchmark on a large majority of forecasting windows, consistent with the variance-driven mechanism identified by the theory.
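The variance mechanism the abstract describes can be illustrated with a toy simulation (this is a minimal sketch, not the paper's experiment: it assumes i.i.d. Gaussian returns, so the conditional expectation of the next return is zero and the Bayes-optimal squared-loss predictor is the constant 0; the coefficient `beta` is a hypothetical stand-in for an expressive model that reuses past noise):

```python
import random

random.seed(0)
# i.i.d. Gaussian returns: a toy series with no conditional structure,
# so the Bayes-optimal squared-loss forecast of the next return is 0.
returns = [random.gauss(0.0, 1.0) for _ in range(100_000)]

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

# Flat (Bayes-optimal) predictor: always forecast 0.
mse_flat = mse([0.0] * (len(returns) - 1), returns[1:])

# Predictor that reuses past noise: forecast beta * previous return.
# Any beta != 0 adds variance (roughly beta^2 * Var) with no bias reduction.
beta = 0.5  # hypothetical illustration value
mse_noisy = mse([beta * r for r in returns[:-1]], returns[1:])

print(mse_flat, mse_noisy)
```

Under these assumptions the flat forecast attains the minimal risk (the return variance), while the noise-reusing predictor's error is strictly larger, mirroring the variance-driven degradation described above.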


Binary Expansion Group Intersection Network

Zhou, Sicheng, Zhang, Kai

arXiv.org Machine Learning

Conditional independence is central to modern statistics, but beyond special parametric families it rarely admits an exact covariance characterization. We introduce the binary expansion group intersection network (BEGIN), a distribution-free graphical representation for multivariate binary data and bit-encoded multinomial variables. For arbitrary binary random vectors and bit representations of multinomial variables, we prove that conditional independence is equivalent to a sparse linear representation of conditional expectations, to a block factorization of the corresponding interaction covariance matrix, and to block diagonality of an associated generalized Schur complement. The resulting graph is indexed by the intersection of multiplicative groups of binary interactions, yielding an analogue of Gaussian graphical modeling beyond the Gaussian setting. This viewpoint treats data bits as atoms and local BEGIN molecules as building blocks for large Markov random fields. We also show how dyadic bit representations allow BEGIN to approximate conditional independence for general random vectors under mild regularity conditions. A key technical device is the Hadamard prism, a linear map that links interaction covariances to group structure.
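The equivalence between conditional independence and a covariance-type factorization can be checked exactly in a tiny binary example (a toy verification of the classical fact for ±1-coded binary variables, not the paper's BEGIN construction; the probabilities below are hypothetical): X is independent of Y given Z if and only if E[XY | Z=z] = E[X | Z=z] E[Y | Z=z] for every z, i.e. the conditional interaction covariance vanishes.

```python
from itertools import product

# Joint pmf with X _|_ Y | Z built in by construction (+/-1 coding).
p_z = {-1: 0.4, 1: 0.6}           # P(Z = z), hypothetical values
p_x_given_z = {-1: 0.3, 1: 0.8}   # P(X = +1 | Z = z)
p_y_given_z = {-1: 0.5, 1: 0.1}   # P(Y = +1 | Z = z)

def pmf(x, y, z):
    px = p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y == 1 else 1 - p_y_given_z[z]
    return p_z[z] * px * py

for z in (-1, 1):
    mass = sum(pmf(x, y, z) for x, y in product((-1, 1), repeat=2))
    e_xy = sum(x * y * pmf(x, y, z) for x, y in product((-1, 1), repeat=2)) / mass
    e_x = sum(x * pmf(x, y, z) for x, y in product((-1, 1), repeat=2)) / mass
    e_y = sum(y * pmf(x, y, z) for x, y in product((-1, 1), repeat=2)) / mass
    # Conditional interaction covariance is exactly zero under X _|_ Y | Z.
    assert abs(e_xy - e_x * e_y) < 1e-12
print("zero conditional interaction covariance verified")
```

This is the scalar seed of the block-factorization statement in the abstract: for higher-dimensional bit vectors, the same identity holds interaction by interaction.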





50a074e6a8da4662ae0a29edde722179-AuthorFeedback.pdf

Neural Information Processing Systems

In order to help clarify our contributions and organize them for readers, we provide the following table to summarize the differences between regrets.

REVIEWER 4: Thank you for your comments. Concept drift occurs when the optimal model at time t may no longer be the optimal model at time t+1. Consider an online learning problem with concept drift with T = 3 time periods and loss functions: f1(x) = (x - 1)^2, f2(x) = (x - 2)^2, f3(x) = (x - 3)^2.

Theoretical motivation via calibration: A more formal motivation of our regret can be related to the concept of calibration [1]. The comment on line 110 can be rewritten as: if the updates {x1, ..., xT} are well-calibrated, then perturbing xt by any u cannot substantially reduce the cumulative loss. Hence, it can be said that the sequence {x1, ..., xT} is asymptotically calibrated with respect to {f1, ..., fT} if:

We indeed ran experiments using SGD with momentum for various decay parameters and concluded that SGD with momentum is not even as stable as SGD-online (standard SGD without momentum), as shown in Figure 1.

Figure 1: SGD online with momentum
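The concept-drift example with T = 3 quadratic losses can be worked through numerically (a minimal sketch of the rebuttal's example only; the comparison of a static model against the drifting per-period optima is our illustration, not a claim from the feedback):

```python
# Rebuttal's example: f1(x) = (x-1)^2, f2(x) = (x-2)^2, f3(x) = (x-3)^2.
losses = [lambda x, t=t: (x - t) ** 2 for t in (1, 2, 3)]

# Per-period optimal models drift: argmin f_t = t.
per_period_optima = [1, 2, 3]

# Best single fixed model in hindsight minimizes the sum of losses:
# x* = mean of the optima = 2.
x_star = sum(per_period_optima) / len(per_period_optima)
static_loss = sum(f(x_star) for f in losses)                          # 1 + 0 + 1
tracking_loss = sum(f(x) for f, x in zip(losses, per_period_optima))  # 0 + 0 + 0

print(static_loss, tracking_loss)
```

Even the best fixed comparator incurs cumulative loss 2, while a learner that tracks the drifting optimum incurs 0, which is exactly why a drift-aware regret notion is needed.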